A comprehensive resource for Bordetella genomic epidemiology and biodiversity studies

The genus Bordetella includes bacteria that are found in the environment and/or associated with humans and other animals. A few closely related species, including Bordetella pertussis, are human pathogens that cause diseases such as whooping cough. Here, we present a large database of Bordetella isolates and genomes and develop genotyping systems for the genus and for the B. pertussis clade. To generate the database, we merge previously existing databases from Oxford University and Institut Pasteur, import genomes from public repositories, and add 83 newly sequenced B. bronchiseptica genomes. The public database currently includes 2582 Bordetella isolates and their provenance data, and 2085 genomes (https://bigsdb.pasteur.fr/bordetella/). We use core-genome multilocus sequence typing (cgMLST) to develop genotyping systems for the whole genus and for B. pertussis, as well as specific schemes to define antigenic, virulence and macrolide resistance profiles. Phylogenetic analyses allow us to redefine evolutionary relationships among known Bordetella species, and to propose potential new species. Our database provides an expandable resource for genotyping of environmental and clinical Bordetella isolates, thus facilitating evolutionary and epidemiological research on whooping cough and other Bordetella infections.

The tree was made using the neighbor-joining method within BIGSdb and was midpoint rooted and annotated using iTOL, called directly from the BIGSdb platform. Public genomes were used, but only some representatives of the BbGS were selected (same as those in Figure

Supplementary notes: Variation of vaccine antigens and virulence-associated genes
The main alleles of the loci of these schemes are summarized in Supplementary Data 6. We here describe the presence or absence of these genes, and the distribution of their alleles, within B. bronchiseptica (Bbs) and B. pertussis (Bp).

Variation within B. bronchiseptica
Supplementary Fig. 5 provides a visual illustration of allele variation of the different genotyping schemes presented below.

2400_5550)
Even though pertussis toxin (PT) is known to be only produced by Bp isolates, the loci for PT promoter and subunits (ptxP, ptxA, ptxB, ptxC, ptxD, and ptxE) were present across most of the BbGS members. When present, alleles of these loci were specific for each lineage, i.e., uniquely observed within a single lineage.
A notable exception was observed for 58 of 63 isolates from sublineage I-4, in which no alleles of the six PT-related loci were detected. This agrees with previous results showing the absence of ptxABCDE in isolates Bbr77 and MO149 from sublineage I-4 4.
Furthermore, we noticed that isolates of Bbs lineage II did not have alleles for the ptxP locus.
This was true with the initial scan parameters (i.e., 90% identity and 90% alignment with the type allele) as well as when these scan parameters were relaxed to 50%.
Regarding fimbriae, alleles of locus fim2 were detected only in lineage I-2 (Bp) using our standard parameters. However, when the scan parameters were relaxed, we obtained partial fim2 matches for 78 isolates of lineages I-1, I-3, I-4 and II, with identity ranging from 78% to 97%.
For locus fhaB (2400-5550 region), no alleles were called for 71 isolates, 67 of which belonged to sublineage I-1. Among these 67 genomes, no match was found for 16, whereas relaxing the scan parameters yielded a partial match (1591 to 3099 bp out of 3151 bp) with high identity (98 to 100%) for the remaining 51.
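The effect of the identity and alignment thresholds on such partial matches can be illustrated with a minimal sketch. The `Hit` class and the `passes_scan` function are hypothetical names introduced here for illustration and are not part of BIGSdb; the sketch only assumes that a hit is retained when both its percent identity and the percentage of the type allele covered by the alignment reach the chosen thresholds. The identity value of 98% is an illustrative choice within the reported range.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical sequence match against a type allele."""
    identity: float   # percent identity over the aligned region
    aligned_bp: int   # aligned length, in bp
    ref_bp: int       # length of the type allele, in bp


def alignment_pct(hit: Hit) -> float:
    """Percentage of the type allele covered by the alignment."""
    return 100.0 * hit.aligned_bp / hit.ref_bp


def passes_scan(hit: Hit, min_identity: float = 90.0,
                min_alignment: float = 90.0) -> bool:
    """True when the hit meets both thresholds (assumed rule, see lead-in)."""
    return hit.identity >= min_identity and alignment_pct(hit) >= min_alignment


# Shortest partial fhaB match reported above: 1591 bp out of 3151 bp.
shortest = Hit(identity=98.0, aligned_bp=1591, ref_bp=3151)

print(round(alignment_pct(shortest), 1))                              # 50.5
print(passes_scan(shortest))                                          # False
print(passes_scan(shortest, min_identity=50.0, min_alignment=50.0))   # True
```

Under these assumptions, the shortest partial match covers about half of the type allele, which is why it only appears once both thresholds are lowered to 50%.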

Other toxins scheme
For the two loci cyaA and dnt, included in the "Other toxins" scheme, we observed either sequence polymorphism or variation in the presence of these genes.
Gene cyaA encodes adenylate cyclase, an important toxin produced by isolates of the BbGS 5,6. No alleles could be assigned for 32 isolates of lineage I-1 and 6 isolates of lineage II, consistent with previous findings: in strain 253 (BIGSdb ID: 1036) of lineage I-1 6,7, the cyaA gene is replaced by a ptp operon 6. Interestingly, in 5 of the 6 isolates from lineage II (such as strain HT200 8, BIGSdb ID: 2432), no alleles were found even when the scan parameters were relaxed.
The dermonecrotic toxin (DNT) is involved in turbinate atrophy and bronchopneumonia in pigs 9. All isolates collected from pigs in our dataset were found in sublineage I-1 and had alleles at the dnt locus (as, for example, in strain S798, used to investigate DNT expression in
Vag8 is an additional autotransporter involved in complement evasion 20. Alleles were captured for isolates of lineages I-4 and I-2 (Bp) but were missing in some isolates of lineage I-1 and in all isolates of lineage II.

T3SS scheme
The allele variation at loci of this scheme is defined in the main text.

Phase biology genes scheme
Phase variation due to mutations in either bvgA or bvgS is frequent in Bbs. We found that bvgA and bvgS alleles were highly specific for each lineage or sublineage.
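Lineage specificity of alleles, as described for bvgA and bvgS (and for the PT-related loci), can be checked with a short sketch such as the one below. The function name, the input layout (pairs of lineage and allele identifier, with `None` when no allele was called) and the toy values are assumptions made for illustration; they are not taken from the database.

```python
from collections import defaultdict


def alleles_shared_across_lineages(records):
    """Return alleles observed in more than one lineage.

    records: iterable of (lineage, allele_id) pairs for a single locus;
    allele_id is None when no allele was called for that isolate.
    """
    lineages_by_allele = defaultdict(set)
    for lineage, allele in records:
        if allele is not None:          # uncalled alleles are ignored
            lineages_by_allele[allele].add(lineage)
    return {allele: sorted(lineages)
            for allele, lineages in lineages_by_allele.items()
            if len(lineages) > 1}


# Toy calls for one locus (made-up values, for illustration only).
toy = [("I-1", 5), ("I-1", 5), ("I-2", 1), ("I-4", None), ("II", 9)]

# An empty result means every allele is restricted to a single lineage.
print(alleles_shared_across_lineages(toy))   # {}
```

An empty dictionary corresponds to the situation reported here, in which each allele is observed within a single lineage or sublineage.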

Variation within B. pertussis
As expected, the diversity of alleles of the different schemes was much lower for B. pertussis than within the BbGS.

2400_5550)
The allele variation at loci of this scheme is defined in the main text.

Other toxins scheme
Bp isolates were characterized by cyaA allele 4, except for one isolate (ID 1240_B096) with allele 43. Eight isolates had no allele called but showed a partial match (3402 to 3424 bp of the expected 5121 bp) with 100% identity to the cyaA gene.

Autotransporters scheme
Two prn alleles (alleles 1 and 2) were predominant in our dataset. Because we matched our allele nomenclature to the designations used in the literature, this is consistent with the well-known shift from prn1 to prn2 that arose in the whole-cell vaccine (WCV) period and increased in frequency in the acellular vaccine (ACV) period 22. We also detected three isolates with different alleles (prn alleles 3, 7 and 9). In addition, because of the high prevalence of PRN-deficient isolates, often due to an incomplete prn CDS, a large proportion of isolates had no allele for prn, and these were mostly ptxP3.
The autotransporter genes bapC and vag8 were highly conserved: all Bp isolates carried allele 1 at the bapC locus and allele 4 at the vag8 locus.
Most Bp isolates had allele 1 for brkA, except for one isolate with allele 64 (ID 12189, isolate B1816); 14 isolates had no called allele.
tcfA was more diverse: although most isolates had allele 2, six other alleles were found in one isolate each (alleles 4, 5, 11, 12, 13 and 35), and nine isolates had no allele. Bp isolates lacking tcfA owing to deletion of the entire gene have been reported previously 19.
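Per-locus summaries of this kind (predominant allele, rare alleles, isolates without a called allele) can be tallied as in the following sketch. The function name and the toy list of calls are hypothetical and do not reproduce the dataset; `None` is assumed to denote an isolate with no called allele.

```python
from collections import Counter


def summarize_locus(calls):
    """Summarize allele calls for one locus.

    calls: list of allele identifiers, with None for isolates
    in which no allele was called.
    """
    counts = Counter(calls)
    no_call = counts.pop(None, 0)                  # isolates without an allele
    main, main_n = counts.most_common(1)[0] if counts else (None, 0)
    other = sorted(a for a in counts if a != main)  # remaining, rarer alleles
    return {"main": main, "main_n": main_n, "other": other, "no_call": no_call}


# Toy calls (made-up values, for illustration only).
print(summarize_locus([2, 2, 2, 4, None]))
# {'main': 2, 'main_n': 3, 'other': [4], 'no_call': 1}
```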

T3SS scheme
T3SS loci were highly conserved, and as a consequence T3ST-3 was found for most isolates. The main allele at the bopB, bopD, bopN, bsp22 and bteA loci was allele 1. Three isolates had allele 4 for bopB and one had allele 33 for bteA. In addition, 11 isolates had no allele for bteA. Bp isolates that do not produce BteA have been reported previously 23, such as FR0145 (ID 695), in which an IS481 is inserted in the bteA promoter region.
All Bp isolates carried bipA allele 3, whereas other alleles were captured for some Bbs isolates of sublineage I-1 (which comprises RB50), consistent with previous findings 24.